Abstract

We investigate the small deviation probabilities of a class of very smooth stationary 
Gaussian processes playing an important role in Bayesian statistical inference. Our 
calculations are based on the appropriate modification of the entropy method due to 
Kuelbs, Li, and Linde as well as on classical results about the entropy of classes of 
analytic functions. They also involve Tsirelson's upper bound for small deviations 
and shed some light on the limits of sharpness for that estimate. 

1 Introduction 

Let X(t) be a centered stationary Gaussian process identified by its spectral measure
F(du). We restrict X to the interval [0, 1] and evaluate its small deviations with respect
to the uniform norm $\|\cdot\|_\infty$ in terms of the small deviation function

$$\varphi(X,r) = -\log P(\|X\|_\infty \le r), \qquad r \to 0.$$

See [8], [9] for many motivations for the study of small deviations and [10] for a complete
bibliography on this subject. 

In this note, we will be interested in the case of rather smooth processes. Namely, 
consider the family of processes $X_\nu$ corresponding to absolutely continuous spectral
measures

$$F_\nu(du) = \exp\{-|u|^\nu\}\,du, \qquad 0 < \nu < \infty,$$

and a parallel family of periodic processes $\tilde X_\nu$ corresponding to discrete spectral measures

$$\tilde F_\nu(du) = \sum_{k=-\infty}^{\infty} \exp\{-|k|^\nu\}\,\delta_{2\pi k}(du), \qquad 0 < \nu < \infty.$$

The most interesting cases are $\nu = 1$ (exponential spectrum) and $\nu = 2$ (normal
spectrum).

For completeness of exposition, let us close the first family with

$$F_\infty(du) = \mathbf{1}_{[-1,1]}(u)\,du.$$






Although the smoothness properties of $X_\nu$ and $\tilde X_\nu$ are the same, it turns out, quite
surprisingly, that their small deviations behave differently.

An important motivation for this research comes from the recent work of A.W. van der 
Vaart and J.H. van Zanten [13], where such small deviations were considered in the context 
of Bayesian statistics. It was shown that they actually determine posterior convergence 
rates in nonparametric estimation problems. In particular the process $X_2$, which is known
in the Bayesian and machine learning literature as the "squared exponential process", is a
popular building block in the construction of prior distributions on functional parameters, 

cf. e.g. m- 

Before we state the results, let us fix some notation. We write $f(\cdot) \preceq g(\cdot)$ or $g(\cdot) \succeq f(\cdot)$
if $\limsup f/g < \infty$, while the equivalence $f \approx g$ means that we have both $f \preceq g$ and
$g \preceq f$. Moreover, $f(\cdot) \lesssim g(\cdot)$ or $g(\cdot) \gtrsim f(\cdot)$ mean that $\limsup f/g \le 1$. Finally, the strong
equivalence $f \sim g$ means that $\lim f/g = 1$.

It was shown in [14], by using the RKHS-entropy method, that

$$\varphi(X_\nu, r) \preceq |\log r|^2, \qquad \nu \ge 1. \tag{1.1}$$

We will slightly improve this and obtain sharp bounds. Our main results are as follows.
Theorem 1.1 We have

$$\varphi(X_\nu, r) \approx \frac{|\log r|^2}{\log|\log r|}, \qquad 1 < \nu \le \infty, \tag{1.2}$$

and

$$\varphi(X_\nu, r) \approx |\log r|^{1+1/\nu}, \qquad 0 < \nu \le 1, \tag{1.3}$$

as $r \to 0$.

For the periodic processes the asymptotics are somewhat different.
Theorem 1.2 We have

$$\varphi(\tilde X_\nu, r) \approx |\log r|^{1+1/\nu}, \qquad \nu > 0, \tag{1.4}$$

as $r \to 0$.
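To get a feeling for how the two rates compare, here is a small numerical sketch. It only evaluates the rate functions appearing in Theorems 1.1 and 1.2 up to the unspecified multiplicative constants hidden in the weak equivalence; the function names `rate_continuous` and `rate_periodic` are illustrative and not part of the paper.

```python
import math

def rate_continuous(r, nu):
    """Rate of Theorem 1.1 (constants ignored), for 0 < r < 1/e.

    |log r|^2 / log|log r| for nu > 1, and |log r|^(1 + 1/nu) for 0 < nu <= 1.
    """
    L = abs(math.log(r))
    if nu > 1:
        return L ** 2 / math.log(L)
    return L ** (1.0 + 1.0 / nu)

def rate_periodic(r, nu):
    """Rate of Theorem 1.2 (constants ignored): |log r|^(1 + 1/nu) for all nu > 0."""
    L = abs(math.log(r))
    return L ** (1.0 + 1.0 / nu)

if __name__ == "__main__":
    for r in (1e-4, 1e-16, 1e-64):
        print(r, rate_continuous(r, 2.0), rate_periodic(r, 2.0))
```

For $\nu > 1$ the first rate is independent of $\nu$ and eventually dominates the second one, while for $\nu \le 1$ the two expressions coincide.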



Remark 1.3 The exponential discrete spectrum ($\nu = 1$) is well understood for $L_2$-norms
where the estimate

$$\varphi(\tilde X_\nu, r) \sim C|\log r|^2, \qquad \nu = 1,$$

(and even more precise behavior) is obtained in the context of small deviations of
series (with exponentially decreasing coefficients), see [1], [3], or [2]. As usual (but not
always), the small deviation rate is the same for the uniform and for the $L_2$-norm.

Remark 1.4 The radical difference between the two bounds (1.2) and (1.4) is that the first
one does not depend on $\nu$ while the second one does. From this point of view, Theorem
1.1 provides a more surprising result than Theorem 1.2.
The authors were informed by A.I. Nazarov that the same phenomenon has been well known
for many years in the theory of integral operators. Generally speaking, the smoother the
kernel of a symmetric integral operator is, the faster its eigenvalues decrease. However, there
is a kind of barrier: the eigenvalues $\lambda_n$ cannot decrease faster than $\log\lambda_n \sim -n\log n$.
Since the behavior of the eigenvalues is tightly related to small deviations in the $L_2$-norm
(once we consider the covariance operator of a Gaussian process), the bound for the eigenvalues
translates into a bound for small deviations.

Remark 1.5 Notice that we do not have any general tools for tracing connections be- 
tween small deviations for discrete and continuous spectra. The general feeling is that 
discrete spectrum provides larger small deviation probabilities. 

In view of the applications in Bayesian nonparametrics we also provide upper bounds
for the small deviations of rescaled versions of the processes $X_\nu$. For a constant $c < 1$,
define the rescaled process $X^c_\nu$ by setting $X^c_\nu(t) = X_\nu(t/c)$.

Theorem 1.6 For all $c < 1$ we have

$$\varphi(X^c_\nu, r) \preceq \frac{1}{c}\,\frac{|\log r|^2}{\log|\log r|}, \qquad \nu > 1, \tag{1.5}$$

$$\varphi(X^c_\nu, r) \preceq \frac{1}{c}\,|\log r|^{1+1/\nu}, \qquad 0 < \nu \le 1, \tag{1.6}$$

as $r \to 0$.



2 RKHS tools 

In this section, we recall a powerful approach to the study of Gaussian small deviations
based on the entropy of the corresponding kernel (RKHS), suggested by J. Kuelbs and
W. Li in [6]. In the literature, this approach is mainly applied to polynomial entropy,
resp. small deviation function, while the results we need should handle slowly varying
functions. Therefore, for the reader's convenience, we give here the complete proofs.

We work in a fairly general setting. Let X be a centered Gaussian vector in a separable
Banach space $(E, \|\cdot\|)$. Then X generates a kernel, or RKHS, $\mathcal{H}$ which is a linear subspace
of E equipped with the structure of a Hilbert space. For a detailed description of the
RKHS we refer to [9]. We denote by $\mathcal{H}_1$ the unit ball of $\mathcal{H}$. Let the covering number $N(r)$
be defined as the minimal number of balls in the norm $\|\cdot\|$ of radius $r$ that is needed to
cover $\mathcal{H}_1$. Furthermore, let $H(r) := \log N(r)$ be the corresponding metric entropy of $\mathcal{H}_1$.

We still study the behavior of the small deviation function

$$\varphi(r) := \varphi(X,r) := -\log P(\|X\| \le r), \qquad r \to 0.$$

Let us recall the central inequalities proved in [6].
Lemma 2.1 Let $r > 0$ and $\lambda > 0$. Then

$$H(2r/\lambda) \le \varphi(r) + \lambda^2/2,$$
$$H(r/\lambda) \ge \varphi(2r) + \log\Phi(\lambda + \alpha_r),$$

where $\Phi$ is the distribution function of the standard normal law, and $\alpha_r$ is defined by
$-\log\Phi(\alpha_r) = \varphi(r)$.

We obtain the following corollary from the first inequality in the case of a slowly 
varying entropy or small deviation function. 

Corollary 2.2 Let $\beta$ be any real number and $\gamma, C > 0$. Then

• $\varphi(r) \lesssim C|\log r|^\gamma(\log|\log r|)^\beta$ implies $H(r) \lesssim C|\log r|^\gamma(\log|\log r|)^\beta$.

• $H(r) \gtrsim C|\log r|^\gamma(\log|\log r|)^\beta$ implies $\varphi(r) \gtrsim C|\log r|^\gamma(\log|\log r|)^\beta$.
The relations also hold if $\lesssim$ and $\gtrsim$ are replaced by $\preceq$ and $\succeq$, respectively.

Proof: Simply set $\lambda = 2$ in the first inequality in Lemma 2.1. □

The arguments are slightly more involved when using the second inequality because 
of its implicit nature. First recall that 

$$\log\Phi(x) \sim -x^2/2,$$

as $x \to -\infty$. This helps to simplify the second inequality in Lemma 2.1.
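The tail asymptotics of $\log\Phi$, and the resulting behavior $\alpha_r \approx -\sqrt{2\varphi(r)}$ of the quantity defined in Lemma 2.1, can be checked numerically. The following is only an illustrative sketch using the Python standard library; the helper names `log_Phi` and `alpha` are hypothetical, and `alpha` takes the value of the small deviation function as its input rather than $r$ itself.

```python
import math
from statistics import NormalDist

def log_Phi(x):
    """log of the standard normal distribution function, via erfc for left-tail accuracy."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))

def alpha(phi):
    """Solve -log Phi(alpha) = phi for alpha, given a moderately large phi > 0."""
    return NormalDist().inv_cdf(math.exp(-phi))

if __name__ == "__main__":
    for x in (-5.0, -10.0, -30.0):
        # the ratio should approach 1 as x -> -infinity
        print(x, log_Phi(x) / (-x * x / 2.0))
    for phi in (10.0, 50.0, 200.0):
        # the ratio alpha / (-sqrt(2 phi)) should approach 1 as phi grows
        print(phi, alpha(phi) / (-math.sqrt(2.0 * phi)))
```

The convergence is slow (the correction is logarithmic), which is consistent with the statement being only an asymptotic equivalence.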

Lemma 2.3 Let $\lambda = \lambda(r) > 0$ be a function such that $\lambda(r) \le \sqrt{2\varphi(r)}$. Then, as $r \to 0$,

$$H(r/\lambda) \gtrsim \varphi(2r) - \frac{1}{2}\left(\lambda - \sqrt{2\varphi(r)}\right)^2. \tag{2.1}$$

The usual choice in the regularly varying case is $\lambda = -\alpha_r \sim \sqrt{2\varphi(r)}$, which also works
in the case of slow variation. The result reads as follows.

Corollary 2.4 Let $\beta$ be any real and $\gamma, C > 0$. Then

• $H(r) \lesssim C|\log r|^\gamma(\log|\log r|)^\beta$ implies $\varphi(r) \lesssim C|\log r|^\gamma(\log|\log r|)^\beta$.

• Assume that there is a constant $K > 0$ such that $\varphi(r/2) \le K\varphi(r)$ for all $r \in (0, 1)$.
Then

$\varphi(r) \gtrsim C|\log r|^\gamma(\log|\log r|)^\beta$ implies

$$H(r) \gtrsim C\left(1 + \frac{\log K}{2\log 2}\right)^{-\gamma}|\log r|^\gamma(\log|\log r|)^\beta.$$

The relations also hold if $\lesssim$ and $\gtrsim$ are replaced by $\preceq$ and $\succeq$, respectively.

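Proof: Let $\lambda := \sqrt{2\varphi(r)}$. For the first implication note that the assumption for $H$,
relation (2.1), and the fact that $r/\lambda \to 0$ imply that

$$C\left|\log r - \log\sqrt{2\varphi(r)}\right|^\gamma\left(\log\left|\log\left(r/\sqrt{2\varphi(r)}\right)\right|\right)^\beta \gtrsim \varphi(2r). \tag{2.2}$$

The assumption for $H$ furthermore implies that $H(r) \preceq r^{-\tau'}$ for any $\tau' > 0$. By
Proposition 2.4 in [7], this yields

$$\varphi(r) \preceq r^{-\tau}, \qquad \text{for any } \tau > 0. \tag{2.3}$$

Thus

$$\limsup_{r\to 0}\frac{\log\sqrt{2\varphi(r)}}{|\log r|} \le \frac{\tau}{2}\,\limsup_{r\to 0}\frac{|\log r|}{|\log r|} = \frac{\tau}{2}.$$

Also (2.3) implies that $(\log|\log(r/\sqrt{2\varphi(r)})|)^\beta$ can be replaced by $(\log|\log r|)^\beta$ in (2.2).
Therefore

$$\begin{aligned}
\frac{1}{C^{1/\gamma}} &\le \liminf_{r\to 0}\frac{\left|\log r - \log\sqrt{2\varphi(r)}\right|}{\varphi(2r)^{1/\gamma}}\,(\log|\log r|)^{\beta/\gamma} \\
&= \liminf_{r\to 0}\left[\frac{|\log r|}{\varphi(2r)^{1/\gamma}} + \frac{\log\sqrt{2\varphi(r)}}{\varphi(2r)^{1/\gamma}}\right](\log|\log r|)^{\beta/\gamma} \\
&\le \left[1 + \frac{\tau}{2}\right]\liminf_{r\to 0}\frac{|\log r|}{\varphi(2r)^{1/\gamma}}\,(\log|\log r|)^{\beta/\gamma}.
\end{aligned}$$

Letting $\tau \to 0$ yields the assertion.

Let us come to the second implication. We may assume that $K > 1$. First note
that the regularity assumption on $\varphi$ implies that $\varphi(r) \le K' r^{-h}$ with $h = \log K/\log 2$,
$K' := \varphi(1)K$ and all $0 < r < 1$. Now if $\varphi(r) \gtrsim C|\log r|^\gamma(\log|\log r|)^\beta$, we obtain by (2.1)
that

$$H(r/\lambda) \gtrsim C|\log r|^\gamma(\log|\log r|)^\beta.$$

We set $r' := r/\lambda$. We obtain, by the assumption on $\varphi$, that $r' \ge r^{1+h/2}/\sqrt{2K'}$. Therefore,

$$H\left(r^{1+h/2}/\sqrt{2K'}\right) \ge H(r') = H(r/\lambda) \gtrsim C|\log r|^\gamma(\log|\log r|)^\beta.$$

In other words,

$$H(r) \gtrsim C\left|\log\left(r^{1/(1+h/2)}\right)\right|^\gamma(\log|\log r|)^\beta = \frac{C}{(1+h/2)^\gamma}\,|\log r|^\gamma(\log|\log r|)^\beta.$$

□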



Remark 2.5 Note that, as in the regularly varying case, one needs to know something
about the maximal increase of $\varphi$ in order to translate a lower bound for $\varphi$ into a lower
bound for $H$. If it is already known that $\varphi$ behaves logarithmically, then the assumption
holds for any $K > 1$ and one also obtains strong asymptotic equivalence.

As a particular case of Corollaries 2.2 and 2.4 we obtain the following.
Corollary 2.6 Let $\beta$ be any real and $\gamma > 0$. Then

$$\varphi(r) \approx |\log r|^\gamma(\log|\log r|)^\beta \quad\Longleftrightarrow\quad H(r) \approx |\log r|^\gamma(\log|\log r|)^\beta.$$
3 Entropy of stationary RKHS



Let now X be a complex valued centered stationary Gaussian process with spectral measure
F and continuous sample paths. We consider X as a random element of $E = C[0, 1]$.
It is well known (see e.g. [U]) that the RKHS $\mathcal{H}$ admits the following representation: $h \in \mathcal{H}$
iff

$$h(t) = \int_{-\infty}^{\infty}\ell(u)e^{-itu}F(du), \qquad \ell \in L_2(\mathbb{R}, F), \tag{3.1}$$

for $0 \le t \le 1$ and

$$\|h\|_{\mathcal{H}} = \inf\|\ell\|_{2,F},$$

where the infimum is taken over all $\ell$ satisfying (3.1). In particular, $h \in \mathcal{H}_1$ (here, as above,
$\mathcal{H}_1$ is the unit ball in $\mathcal{H}$) iff (3.1) holds with $\ell$ such that $\|\ell\|_{2,F} \le 1$.

Now we specify this general scheme to the processes we are interested in and evaluate 
the entropy. 



3.1 Continuous spectrum 

Let now $F(du) = f_\nu(u)\,du$, where $f_\nu(u) = e^{-|u|^\nu}$, $\nu > 0$. We prove the following.
Proposition 3.1 For any $\nu > 1$ it is true that

$$H(\mathcal{H}_1, \varepsilon) \approx \frac{|\log\varepsilon|^2}{\log|\log\varepsilon|}.$$





Proof: 

Upper bound. Let $h \in \mathcal{H}_1$. Then the representation (3.1) holds with some $\ell$ such that

$$\|\ell\|^2_{2,F} = \int_{-\infty}^{\infty}|\ell(u)|^2 f_\nu(u)\,du \le 1.$$

Clearly, $h$ turns out to be an entire analytic function well defined on $\mathbb{C}$ by the same
expression (3.1) and by Hölder's inequality

$$|h(z)| \le \int_{-\infty}^{\infty}e^{|\mathrm{Im}(z)||u|}\,|\ell(u)|\,f_\nu(u)\,du \le \left(\int_{-\infty}^{\infty}e^{2|\mathrm{Im}(z)||u|}f_\nu(u)\,du\right)^{1/2} := M_\nu(2|\mathrm{Im}(z)|).$$

Since

$$\log M_\nu(r) = \frac{1}{2}\log\int e^{r|u| - |u|^\nu}\,du \sim \frac{(\nu-1)\,r^{\nu/(\nu-1)}}{2\,\nu^{\nu/(\nu-1)}}, \qquad \text{as } r \to \infty,$$

it follows that

$$|h(z)| \le M_\nu(2|\mathrm{Im}(z)|) \le C_1\exp\{C_2|\mathrm{Im}(z)|^{\nu/(\nu-1)}\}, \qquad \forall h \in \mathcal{H}_1,\ z \in \mathbb{C}, \tag{3.2}$$

with appropriate constants $C_1 = C_1(\nu)$, $C_2 = C_2(\nu)$. It is known from Theorem XX of
[5] that the entropy of the class of all entire analytic functions $A(C_1, C_2, \nu)$ satisfying the
even weaker condition

$$|h(z)| \le C_1\exp\{C_2|z|^{\nu/(\nu-1)}\}, \qquad \forall z \in \mathbb{C}, \tag{3.3}$$

verifies

$$H(A(C_1, C_2, \nu), \varepsilon) \approx \frac{|\log\varepsilon|^2}{\log|\log\varepsilon|}.$$

Since $\mathcal{H}_1 \subset A(C_1, C_2, \nu)$, we obtain

$$H(\mathcal{H}_1, \varepsilon) \preceq \frac{|\log\varepsilon|^2}{\log|\log\varepsilon|}.$$




Lower bound. Here we will only need the inequality

$$f(u) \ge c_f, \qquad |u| \le 1, \tag{3.4}$$

for a constant $c_f > 0$, which is fulfilled for all densities $f_\nu$, $\nu > 0$.

We start with a construction of an auxiliary function and study its properties. Take
any $\gamma \in (0, 1)$ and let a sequence $(a_k)_{k\ge 1}$ be defined by $a_k = ck^{-1-\gamma}$ and normalized so
that $\sum_{k=1}^{\infty}a_k = 1$. Let

$$G(z) = \prod_{k=1}^{\infty}\frac{\sin(a_k z)}{a_k z}.$$

Since

$$\left|\frac{\sin z}{z}\right| \le e^{|z|},$$

we have

$$|G(z)| \le \exp\left\{\sum_{k=1}^{\infty}a_k|z|\right\} = e^{|z|}. \tag{3.5}$$

The function $G$ is rapidly decreasing on the real line. Namely, for any large $t \in \mathbb{R}$ choose
a positive integer $\kappa = \kappa(t)$ such that $a_\kappa|t| \approx 2$, i.e. $\kappa \approx (c|t|/2)^{1/(1+\gamma)}$. Then

$$|G(t)| \le \prod_{k=1}^{\kappa}|a_k t|^{-1} \le 2^{-\kappa} \le \exp\left(-C_G|t|^{1/(1+\gamma)}\right) \tag{3.6}$$

with appropriate $C_G > 0$. Finally, notice that

$$g_G := \inf_{0\le t\le 1}|G(t)| > 0, \tag{3.7}$$

since the smallest zero of $G$ is attained at $\pi/c > \pi > 1$.
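The two properties of the auxiliary function used later, positivity on $[0,1]$ and fast decay on the real line, can be observed on a truncated version of the infinite product. The sketch below fixes $\gamma = 1/2$ and truncates the product at 2000 factors; both choices, as well as the helper names `zeta`, `a`, `G` and `kappa`, are assumptions made for illustration only.

```python
import math

GAMMA = 0.5  # any value in (0, 1) is allowed; 0.5 is an arbitrary choice

def zeta(s, terms=100000):
    """Crude approximation of sum_{k>=1} k^(-s), s > 1: partial sum plus integral tail."""
    return sum(k ** -s for k in range(1, terms + 1)) + terms ** (1.0 - s) / (s - 1.0)

C = 1.0 / zeta(1.0 + GAMMA)  # normalization so that sum_k a_k = 1

def a(k):
    return C * k ** (-1.0 - GAMMA)

def G(t, terms=2000):
    """Truncated product prod_{k <= terms} sin(a_k t) / (a_k t)."""
    prod = 1.0
    for k in range(1, terms + 1):
        x = a(k) * t
        prod *= math.sin(x) / x if x != 0.0 else 1.0
    return prod

def kappa(t):
    """Largest integer k with a_k |t| >= 2, i.e. k <= (C|t|/2)^(1/(1+GAMMA))."""
    return int((C * abs(t) / 2.0) ** (1.0 / (1.0 + GAMMA)))

if __name__ == "__main__":
    print("c =", C, " smallest zero pi/c =", math.pi / C)
    print("min of |G| on [0, 1]:", min(abs(G(j / 50.0)) for j in range(51)))
    for t in (50.0, 200.0, 1000.0):
        print(t, abs(G(t)), 2.0 ** -kappa(t))
```

Each of the first $\kappa(t)$ factors is bounded by $|a_k t|^{-1} \le 1/2$ and all remaining factors are bounded by one, which is exactly the mechanism behind (3.6).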

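Now we start the entropy estimation. Consider the class $\mathcal{D}_K$ of analytic functions $\psi$ on
the complex plane satisfying

$$|\psi(z)| \le K\exp\{|z|^{1/2}\}, \qquad z \in \mathbb{C}. \tag{3.8}$$

Again by Theorem XX in [5] it is true that

$$H(\mathcal{D}_K, \varepsilon) \approx \frac{|\log\varepsilon|^2}{\log|\log\varepsilon|}. \tag{3.9}$$

Next, consider a class of functions

$$B_K = \{b:\ b(z) = \psi(z)G(z),\ \psi \in \mathcal{D}_K,\ z \in \mathbb{C}\}.$$

With a minor abuse of notation, we do not distinguish the functions from $B_K$ and their
restrictions to $[0,1]$. Clearly,

$$H(B_K, \varepsilon) \ge H(\mathcal{D}_K, g_G^{-1}\varepsilon) \approx \frac{|\log\varepsilon|^2}{\log|\log\varepsilon|}. \tag{3.10}$$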



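We will show now that for an appropriate choice of the parameter $K$ it is true that
$B_K \subset \mathcal{H}_1$. Let $b \in B_K$. Then by (3.5) and (3.8) we have

$$|b(z)| \le K\exp\{|z| + |z|^{1/2}\}.$$

Moreover, by (3.6) and (3.8),

$$\int_{-\infty}^{\infty}|b(t)|^2\,dt \le K^2\int_{-\infty}^{\infty}\exp\left\{2|t|^{1/2} - 2C_G|t|^{1/(1+\gamma)}\right\}dt := K^2C_3^2 < \infty.$$

By using these two properties, it follows from the classical Paley-Wiener theorem ([12]
or [1], Chap. IV) that the Fourier transform

$$\hat b(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iut}b(t)\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iut}\psi(t)G(t)\,dt$$

vanishes outside the interval $[-1, 1]$.
On the other hand, we can write

$$b(t) = \frac{1}{\sqrt{2\pi}}\int_{-1}^{1}e^{-iut}\,\hat b(u)\,du = \frac{1}{\sqrt{2\pi}}\int_{-1}^{1}e^{-iut}\,\frac{\hat b(u)}{f(u)}\,f(u)\,du = \int_{-1}^{1}e^{-iut}\,\ell(u)\,f(u)\,du,$$

where $\ell(u) := \hat b(u)/(\sqrt{2\pi}f(u))$ for $|u| \le 1$ and $\ell(u) := 0$ otherwise.

It remains to show that $\|\ell\|_{2,F} \le 1$. By using (3.4) we have, indeed,

$$\|\ell\|^2_{2,F} = \frac{1}{2\pi}\int_{-1}^{1}\frac{|\hat b(u)|^2}{f(u)}\,du \le \frac{1}{2\pi c_f}\|\hat b\|^2_{L_2(\mathbb{R})} = \frac{1}{2\pi c_f}\|b\|^2_{L_2(\mathbb{R})} \le \frac{K^2C_3^2}{2\pi c_f} \le 1,$$

whenever $K$ is chosen sufficiently small (depending on $c_f$). Thus $B_K \subset \mathcal{H}_1$ and we obtain
from (3.10)

$$H(\mathcal{H}_1, \varepsilon) \ge H(B_K, \varepsilon) \succeq \frac{|\log\varepsilon|^2}{\log|\log\varepsilon|},$$

as required. □
For small values of $\nu$ we only need the following upper bound.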
Proposition 3.2 For any $\nu \le 1$ it is true that

$$H(\mathcal{H}_1, \varepsilon) \preceq |\log\varepsilon|^{1+1/\nu}.$$

Proof: The idea is simple: for $\nu = 1$ the result is already known from (1.1) and we reduce
the general case to that one by truncation of the spectral measure. Namely, for any $\varepsilon > 0$
let $v = (3|\log\varepsilon|)^{1/\nu}$. Then by (3.1) the elements of $\mathcal{H}_1$ have the form

$$h(t) = \left(\int_{|u|\le v} + \int_{|u|>v}\right)\ell(u)e^{-itu}F(du) := h_v(t) + h^v(t), \qquad \|\ell\|_{2,F} \le 1.$$

By the choice of $v$ we have

$$|h^v(t)| \le \|\ell\|_{2,F}\left(\int_{|u|>v}\exp(-|u|^\nu)\,du\right)^{1/2} \le C\exp(-|v|^\nu/2)\,v^{(1-\nu)/2} \le \varepsilon$$

for small $\varepsilon$. Therefore, we only need to study the entropy of the set $\mathcal{H}_1^v := \{h_v,\ h \in \mathcal{H}_1\}$.
This will be done by means of the following result from [5] in the quantitative version of
[14], Lemma 2.3.



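Lemma 3.3 Let $F$ be a spectral measure and let a positive $\delta < 1$ be such that

$$I := \int e^{\delta|u|}F(du) \le 1.$$

Then

$$H(\mathcal{H}_1, \varepsilon) \le C\,\frac{|\log\varepsilon|^2}{\delta},$$

where $C$ is a numeric constant.

First, notice that if we drop the assumption $I \le 1$, then by scaling reasons we still have

$$H(\mathcal{H}_1, \varepsilon) \le C\,\frac{|\log(\varepsilon/\sqrt{I})|^2}{\delta}. \tag{3.11}$$

Apply this bound to our truncated measure $e^{-|u|^\nu}\mathbf{1}_{\{|u|\le v\}}\,du$ and $\delta = \theta|\log\varepsilon|^{1-1/\nu}$ with
appropriately small parameter $\theta < 3^{-1/\nu}$. Notice that $\delta|u| \le |u|^\nu$ whenever $|u| \le v$. Hence

$$I = \int_{|u|\le v}e^{\delta|u| - |u|^\nu}\,du \le 2v \approx |\log\varepsilon|^{1/\nu}.$$

We obtain from (3.11)

$$H(\mathcal{H}_1^v, \varepsilon) \preceq \frac{|\log(\varepsilon/\sqrt{I})|^2}{\delta} \approx |\log\varepsilon|^{1+1/\nu},$$

as required. □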
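The two elementary facts behind the truncation argument, namely that the discarded spectral tail contributes at most $\varepsilon$ and that $\delta|u| \le |u|^\nu$ on the retained part, are easy to test numerically. The following sketch does this for $\nu = 1$ and $\nu = 1/2$, where the tail mass has a closed form; the function names and the restriction to these two values of $\nu$ are choices made here for illustration.

```python
import math

def truncation_level(eps, nu):
    """v = (3 |log eps|)^(1/nu), the truncation level used in the proof."""
    return (3.0 * abs(math.log(eps))) ** (1.0 / nu)

def tail_bound(eps, nu):
    """Square root of the mass of exp(-|u|^nu) du on {|u| > v} (closed form, nu in {1, 1/2})."""
    v = truncation_level(eps, nu)
    if nu == 1.0:
        mass = 2.0 * math.exp(-v)                  # 2 * int_v^inf e^{-u} du
    elif nu == 0.5:
        s = math.sqrt(v)
        mass = 4.0 * (s + 1.0) * math.exp(-s)      # 2 * int_v^inf e^{-sqrt(u)} du
    else:
        raise ValueError("closed form implemented only for nu in {1, 1/2}")
    return math.sqrt(mass)

def delta_condition_holds(eps, nu, theta):
    """Check delta |u| <= |u|^nu at |u| = v, the worst case on [0, v] when nu <= 1."""
    v = truncation_level(eps, nu)
    delta = theta * abs(math.log(eps)) ** (1.0 - 1.0 / nu)
    return delta * v <= v ** nu

if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8):
        print(eps, tail_bound(eps, 1.0), tail_bound(eps, 0.5),
              delta_condition_holds(eps, 0.5, 0.1))
```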
3.2 Discrete spectrum

Let now $F(du) = \sum_{k=-\infty}^{\infty}\exp\{-|k|^\nu\}\,\delta_{2\pi k}(du)$, $\nu > 0$. We prove the following.

Proposition 3.4 For any $\nu > 0$ it is true that

$$H(\mathcal{H}_1, \varepsilon) \preceq |\log\varepsilon|^{1+1/\nu}.$$

Proof: The reasoning goes along the same lines as that of the upper bound in the previous
proposition. Let $h \in \mathcal{H}_1$. Then the representation (3.1) means that

$$h(t) = \sum_{k=-\infty}^{\infty}\ell_k\,e^{-ikt - |k|^\nu} \tag{3.12}$$

with some $\ell = (\ell_k)$ such that

$$\|\ell\|^2_{2,F} = \sum_k|\ell_k|^2\exp\{-|k|^\nu\} \le 1.$$

Clearly, $h$ turns out to be a periodic entire analytic function well defined on $\mathbb{C}$ by the
same expression (3.12) and by Hölder's inequality

$$|h(z)| \le \sum_{k=-\infty}^{\infty}e^{|\mathrm{Im}(z)||k| - |k|^\nu}|\ell_k| \le \left(\sum_{k=-\infty}^{\infty}e^{2|\mathrm{Im}(z)||k| - |k|^\nu}\right)^{1/2} := \tilde M_\nu(2|\mathrm{Im}(z)|).$$

It follows again that

$$|h(z)| \le C_1\exp\{C_2|\mathrm{Im}(z)|^{\nu/(\nu-1)}\}, \qquad \forall h \in \mathcal{H}_1,\ z \in \mathbb{C},$$

with appropriate constants $C_1 = C_1(\nu)$, $C_2 = C_2(\nu)$. It is known by Theorem XXI in [5]
that the entropy of the class of all periodic entire analytic functions $\tilde A(C_1, C_2, \nu)$ satisfying
this condition verifies

$$H(\tilde A(C_1, C_2, \nu), \varepsilon) \approx |\log\varepsilon|^{1+1/\nu}.$$

Hence

$$H(\mathcal{H}_1, \varepsilon) \preceq |\log\varepsilon|^{1+1/\nu}.$$

□



4 Proofs of main results 

Proof of Theorem 1.2. The lower bound for small deviations follows immediately
from Proposition 3.4 and Corollary 2.4.

For getting the upper bound we implement a simple but ingenious idea of B.S. Tsirelson
initially designed for continuous spectra in [11]. Let $l$ be an integer. Let us consider an
auxiliary centered stationary Gaussian process $Y = Y_l(t)$ with the spectral measure

$$F_Y(du) = \exp\{-l^\nu\}\sum_{|k|\le l}\delta_{2\pi k}(du),$$

which minorates $\tilde F_\nu$. By the standard Anderson argument

$$P(\|\tilde X_\nu\|_\infty \le r) \le P(\|Y\|_\infty \le r), \qquad \forall r > 0.$$

The covariance of $Y$ is

$$E\,Y(s+t)\overline{Y(s)} = \exp\{-l^\nu\}\sum_{|k|\le l}e^{ikt}, \tag{4.1}$$

while for the variance we have

$$\sigma^2 := E|Y(t)|^2 = \exp\{-l^\nu\}(2l+1). \tag{4.2}$$

Define a grid step $\Delta = 2\pi/(2l+1)$. Observe from (4.1) that $(Y(k\Delta))_{k\in\mathbb{Z}}$ is a centered
Gaussian non-correlated, hence independent, sequence with variance (4.2). For any $r > 0$
we get the bound

$$\begin{aligned}
P(\|Y\|_\infty \le r) &\le P\Big(\sup_{0\le k\le 1/\Delta}|Y(k\Delta)| \le r\Big) \le P(\sigma|N| \le r)^{1/\Delta} \le \left(\sqrt{\frac{2}{\pi}}\,\frac{r}{\sigma}\right)^{1/\Delta} \le \left(\frac{r}{\sigma}\right)^{(2l+1)/2\pi} \\
&= \left(\frac{r}{\sqrt{(2l+1)\exp\{-l^\nu\}}}\right)^{(2l+1)/2\pi} \le \left(r\exp\{l^\nu\}\right)^{(2l+1)/2\pi},
\end{aligned}$$

where $N$ denotes a standard normal random variable.
Next, an elementary optimization suggests to set

$$l := \left(\frac{|\log r|}{\nu+1}\right)^{1/\nu},$$

whereas

$$\varphi(\tilde X_\nu, r) \ge \varphi(Y, r) \ge -\frac{2l+1}{2\pi}\log(r\exp\{l^\nu\}) \sim \frac{l}{\pi}\left(|\log r| - l^\nu\right) = \frac{\nu\,l\,|\log r|}{\pi(\nu+1)} = \frac{\nu\,|\log r|^{1+1/\nu}}{\pi(\nu+1)^{1+1/\nu}},$$

and we arrive at the desired estimate. □
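The "elementary optimization" amounts to maximizing $l \mapsto (l/\pi)(|\log r| - l^\nu)$ over $l > 0$. The sketch below compares the closed-form maximizer with a brute-force grid search, and also checks that a Dirichlet-type kernel $\sum_{|k|\le l}e^{ikt}$ vanishes at the nonzero grid points $2\pi j/(2l+1)$, which is what makes the sampled values uncorrelated. The function names are illustrative, and the kernel check assumes integer frequencies $k$, as suggested by the grid step used in the proof.

```python
import math

def tsirelson_exponent(l, abs_log_r, nu):
    """Objective (l / pi) * (|log r| - l^nu) to be maximized over l > 0."""
    return (l / math.pi) * (abs_log_r - l ** nu)

def optimal_l(abs_log_r, nu):
    """Closed-form maximizer l = (|log r| / (nu + 1))^(1/nu)."""
    return (abs_log_r / (nu + 1.0)) ** (1.0 / nu)

def optimal_value(abs_log_r, nu):
    """Closed-form maximum nu |log r|^(1+1/nu) / (pi (nu+1)^(1+1/nu))."""
    p = 1.0 + 1.0 / nu
    return nu * abs_log_r ** p / (math.pi * (nu + 1.0) ** p)

def dirichlet_kernel(l, t):
    """sum_{|k| <= l} e^{ikt} = 1 + 2 sum_{k=1}^{l} cos(kt) (a real number)."""
    return 1.0 + 2.0 * sum(math.cos(k * t) for k in range(1, l + 1))

if __name__ == "__main__":
    L, nu = 50.0, 2.0
    grid_max = max(tsirelson_exponent(j / 1000.0, L, nu) for j in range(1, 10000))
    print(optimal_l(L, nu), optimal_value(L, nu), grid_max)
    l = 5
    print([dirichlet_kernel(l, 2.0 * math.pi * j / (2 * l + 1)) for j in range(1, 2 * l + 1)])
```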

Proof of Theorem 1.1.

For $\nu > 1$ the result follows immediately from Proposition 3.1 and Corollary 2.6.
For $\nu \le 1$ the necessary upper bound follows immediately from Proposition 3.2 and
Corollary 2.4.

The necessary lower bound

$$\varphi(X_\nu, r) \succeq |\log r|^{1+1/\nu} \tag{4.3}$$

holds for any $\nu > 0$ and can be obtained by Tsirelson's method, as described above. In
this case, for any positive $l$ we consider an auxiliary centered stationary Gaussian process
$Y = Y_l(t)$ with the spectral density

$$f_Y(u) = \exp\{-l^\nu\}\,\mathbf{1}_{\{|u|\le l\}},$$

which minorates $f_\nu$. Define a grid step $\Delta = \pi/l$. It is easy to see again that $(Y(k\Delta))_{k\in\mathbb{Z}}$
is a centered Gaussian non-correlated, hence independent, sequence with variance

$$\sigma^2 := E|Y(t)|^2 = 2l\exp\{-l^\nu\},$$

and the final calculation leading to (4.3) goes through exactly as above. □

Remark 4.1 We see from Theorem 1.1 that the estimate (4.3) is not sharp for $\nu > 1$.
This is rather surprising since in the previously known examples (e.g. for polynomially
decreasing spectral densities in [11]) Tsirelson's method always returned the right rates.

Proof of Theorem 1.6.

We prove (1.6), the proof of (1.5) is identical. Clearly,

$$P(\|X^c_\nu\|_\infty \le r) = P\Big(\sup_{t\in[0,1/c]}|X_\nu(t)| \le r\Big).$$

Let $\mathcal{H}^c_1$ be the unit ball of the RKHS of the process $X_\nu$ viewed as a random element in
$C[0, 1/c]$, i.e. the class of functions on $[0, 1/c]$ of the form

$$h(t) = \int_{-\infty}^{\infty}\ell(u)e^{-itu}\,dF_\nu(u), \qquad \|\ell\|_{2,F_\nu} \le 1.$$

Let $n$ be the smallest integer larger than or equal to $1/c$. Observe that if $h \in \mathcal{H}^c_1$, then for
$k = 0, \dots, n-1$, the function $t \mapsto h(k+t)$ on $[0, 1]$ belongs to the unit ball $\mathcal{H}_1$ of the RKHS
of the process $X_\nu$ on $[0, 1]$. Hence, if $h_1, \dots, h_N$ is an $\varepsilon$-net for $\mathcal{H}_1$, then the functions of
the form

$$\sum_{k=0}^{n-1}h_{j_k}(t-k)\,\mathbf{1}_{[k,k+1)}(t), \qquad j_k \in \{1, \dots, N\},$$

form an $\varepsilon$-net for $\mathcal{H}^c_1$. There are at most $N^n$ such functions. We keep only those for which
there exists an element of $\mathcal{H}^c_1$ at uniform distance at most $\varepsilon$. The mentioned elements
form a $2\varepsilon$-net for $\mathcal{H}^c_1$. It follows that

$$H(\mathcal{H}^c_1, 2\varepsilon) \le nH(\mathcal{H}_1, \varepsilon) \le \frac{2}{c}\,H(\mathcal{H}_1, \varepsilon).$$

Now apply Proposition 3.4 and Corollary 2.4 to arrive at (1.6). □
5 An open problem

It would be very interesting to extend our results to more general classes of smooth
processes. Since Tsirelson's bound is sharp for spectral measures $F_\nu$, $0 < \nu \le 1$, and
in the case of polynomial spectral density $f(u) \sim |u|^{-1-\beta}$ it is also known to give a
sharp bound $\varphi(X,r) \approx r^{-2/\beta}$, it is natural to conjecture that this bound is sharp in all
intermediate cases, too. Our methods provide some reasonable bounds for the general case
but they should at least be enhanced in order to solve it properly. For example, on the
test family of intermediate processes $Y_\alpha$ with spectral densities

$$f_\alpha(u) = \exp\{-(\log_+|u|)^\alpha\}, \qquad \alpha > 1,$$

we get

|logr|^ exp{(2|logr|) 1/Q } z< (p(Y a ,r) r< | logrj exp |(2| logr|) 1/a + ^| logr| 2/ " x | , 
which is not as sharp as we would like. 